Learning Tiers for Long-Distance Phonotactics
Abstract
Long-distance phonotactics, in which the surface sounds of a language are subject to co-occurrence constraints referring to nonadjacent segments, present a difficult learning problem. To acquire such patterns, a learner must find dependencies among distant segments. This paper approaches an idealized version of this problem: how can these patterns be learned at all? By introducing a learning algorithm for a class of patterns that fits the typology of long-distance phonotactics, this paper shows that it is possible to induce a tier over which long-distance generalizations can be made. A priori knowledge of a specific tier is shown to be unnecessary; the phonological notions of a tier and of locality are enough to discover both a tier and the long-distance dependencies defined over it. The mechanisms of the algorithm thus represent a theoretical step towards a model of how children might acquire phonotactics from raw language data.

Long-distance dependencies have challenged previously proposed learning algorithms. For example, the learning algorithm presented in Hayes & Wilson (2008) cannot find dependencies among vowels in Shona [±ATR] harmony until it is told a priori to ignore the consonants. However, Hayes & Wilson acknowledge that this fix is not sufficient for patterns requiring more complex tiers, such as consonantal harmony, disharmony, or vowel harmony with neutral vowels. Given the variety of long-distance phonotactic dependencies in natural language phonology, it is entirely possible that humans are endowed with a mechanism for filtering out ‘ignorable’ material in order to find phonotactic generalizations. The current paper introduces a provably correct algorithm for such a mechanism, based in formal language theory and grammatical inference (de la Higuera, 2010).
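To make the notion of a tier concrete, here is a minimal Python sketch of tier projection. It assumes a hypothetical five-vowel inventory and an invented word (not Shona data); the names `project` and `tier_bigrams` and the boundary markers ⋊ and ⋉ are illustrative choices. On the full inventory, e and i in pelito are not adjacent, but once consonants are erased they form the tier bigram (e, i), so a purely local constraint over the projection can express a long-distance dependency.

```python
# Minimal sketch of tier projection. The vowel inventory and the example
# word are hypothetical; they are not taken from the paper's data.
START, END = "⋊", "⋉"  # word-boundary markers
VOWELS = frozenset("aeiou")

def project(word, tier):
    """Erase every symbol that is not on the tier."""
    return "".join(ch for ch in word if ch in tier)

def tier_bigrams(word, tier):
    """Return the set of adjacent pairs on the tier, boundaries included."""
    t = [START] + list(project(word, tier)) + [END]
    return set(zip(t, t[1:]))

word = "pelito"
print(project(word, VOWELS))  # eio
# On the full inventory, e and i are separated by l ...
print(("e", "i") in tier_bigrams(word, set(word)))  # False
# ... but on the vowel tier they are adjacent.
print(("e", "i") in tier_bigrams(word, VOWELS))  # True
```

Telling a learner to ignore the consonants amounts to fixing the tier by hand in this way; the algorithm described in this paper aims to find such a tier without being told.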
The algorithm is designed for a particular class of formal languages known as the Tier-based Strictly 2-Local (TSL2) languages (Heinz et al., 2011), which are governed by grammars that check the adjacency of symbols on a ‘tier’, ignoring all symbols not on that tier. This algorithm, the TSL2 Learning Algorithm, can induce a tier from positive data with no a priori knowledge of what that tier is. In brief, it starts with the tier equal to the full inventory of phonemes and then removes symbols irrelevant to the pattern one by one, each time making new generalizations based on the symbols it has already removed from the tier. This incremental construction of the tier is what makes the algorithm an interesting model of learning. The formal details of the algorithm, including the proof of its correctness, can be found in Jardine & Heinz (in prep.). The purpose of this paper is to discuss the core insight of the algorithm informally, present results from applying it to natural language data, and discuss its relation to the general problem of phonological acquisition. Two simple case studies, of Latin liquid dissimilation and of Finnish vowel harmony, are presented; in both, the learner induces the correct generalization (given particular conditions).

The paper is structured as follows. §2 gives the relevant background on long-distance phonotactics, issues in learning long-distance phonotactics, and the TSL2 formal languages. §3 presents the algorithm, and §4 illustrates the algorithm in use with the two case studies. §5 discusses issues with the learner and how it connects with acquisition, and §6 concludes.
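Once a tier is fixed, a TSL2 grammar reduces to a set of permitted bigrams over the tier projection. The Python sketch below illustrates that idea under explicit assumptions: the tier {l, r}, the toy strings, and the names `tier_bigrams`, `learn_permitted`, and `accepts` are hypothetical, and the "learner" here simply records the tier bigrams attested in positive data. It does not implement the TSL2 Learning Algorithm, whose contribution is inducing the tier itself rather than assuming it.

```python
# Minimal sketch of a TSL2 grammar with a FIXED, assumed tier. The tier,
# the toy data, and all function names are hypothetical illustrations.
START, END = "⋊", "⋉"  # word-boundary markers

def tier_bigrams(word, tier):
    """Adjacent pairs on the projection of `word` onto `tier`."""
    t = [START] + [ch for ch in word if ch in tier] + [END]
    return set(zip(t, t[1:]))

def learn_permitted(data, tier):
    """Record every tier bigram attested in the positive data."""
    permitted = set()
    for w in data:
        permitted |= tier_bigrams(w, tier)
    return permitted

def accepts(word, tier, permitted):
    """A word is well-formed iff all of its tier bigrams are permitted."""
    return tier_bigrams(word, tier) <= permitted

TIER = frozenset("lr")  # assumed here; the TSL2 Learning Algorithm induces it
DATA = ["a", "la", "ra", "lara", "rala", "larla", "rara"]  # toy strings
G = learn_permitted(DATA, TIER)

# Tier bigrams that never occur in the data are, in effect, the forbidden ones.
possible = {(a, b) for a in TIER | {START} for b in TIER | {END}}
print(possible - G)  # {('l', 'l')}

for w in ["larala", "lalara", "ralara"]:
    print(w, accepts(w, TIER, G))  # larala True / lalara False / ralara True
```

In this toy grammar, an r between two l's keeps them from being adjacent on the tier, so larala is accepted, while lalara, whose first two l's are separated only by a vowel, is rejected.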
Publication date: 2015